Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify TF AWS provider configuration logic #1204

Merged
merged 3 commits into from
Mar 14, 2024

Conversation

erhancagirici
Copy link
Collaborator

@erhancagirici erhancagirici commented Mar 12, 2024

Description of your changes

refactors AWS client configuration logic with a single path.

Previously, the AWS config was following various paths due to:

  • Terraform Provider Configuration does not support AssumeRoleChain with lengths > 1
  • Terraform CLI was using the shared scheduler, which required configuration push-down
  • eks.clusterAuth resource has a manual controller implementation, since it is not terraform-based
    To handle these paths, a native AWS configuration was being constructed along with the terraform configuration attributes.
    These were causing in a non-unified configuration path, and duplicated AWS STS API calls due to configuration being processed by the native method and Terraform provider configure for no-fork.

Since the TF CLI is removed all relevant configuration paths can be removed to simplify.

This PR:

  • removal of Terraform configuration block preparation for CLI
    • TerraformSetup.Configuration is no more present
    • no push-down configuration
  • AWS configuration and credential resolution is solely handled by native AWS config.
    • All auth methods resolves down to the final AccessKey and Secret key (temporary) credentials
    • other configuration such as AssumeRoleChain and custom endpoints are also handled by the AWS config
  • move away from native TF Provider's Configure() method
    • this method had a potential to introduce unintended side effects and bugs, due to its singleton nature.
    • instead, the native TF configuration object is directly filled using the already-obtained native AWS config & credentials.
    • This reduces the extra AWS STS calls for GetCallerIdentity and AssumeRole operations.

As a result this leads to 50% reduction in STS calls made per reconcile. Further improvement to STS calls will follow with a cache implementation.

I have:

  • Run make reviewable test to ensure this PR is ready for review.

How has this code been tested

Tested manually with following ProviderConfig specifications:

  • Static creds from secret
  • Static creds from secret with AssumeRoleChain
  • WebIdentity
  • WebIdentity with AssumeRoleChain
  • IRSA
  • IRSA with AssumeRoleChain
  • Upbound Auth (thanks @turkenf for the test)

These copies are not needed anymore after refactoring the AWS configuration logic
to not use native provider Configure()

Signed-off-by: Erhan Cagirici <[email protected]>
@erhancagirici erhancagirici force-pushed the refactor-aws-config-single-path branch from 22ae56a to 1fabbe6 Compare March 13, 2024 10:22
config/registry.go Outdated Show resolved Hide resolved
@ulucinar
Copy link
Collaborator

ulucinar commented Mar 13, 2024

The issue with the publish-artifacts job is a known one with the recent SSO changes. cc. @turkenf

@erhancagirici erhancagirici marked this pull request as ready for review March 13, 2024 17:11
Copy link
Collaborator

@ulucinar ulucinar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @erhancagirici for cleaning up the AWS ProviderConfig code base and improving the situation with the number of per-MR STS calls the provider makes . Could you please also update the PR description with:

  • The details of the changes this PR proposes
  • The manual tests you did to verify these changes
  • STS calls saved with these changes

We can also discuss briefly in the PR description the next iteration of planned improvements, where we would like to introduce a credential cache to further decrease the number of STS calls we make.

Makefile Show resolved Hide resolved
internal/clients/aws.go Outdated Show resolved Hide resolved
internal/clients/provider_config.go Outdated Show resolved Hide resolved
internal/clients/provider_config.go Outdated Show resolved Hide resolved
internal/clients/provider_config.go Outdated Show resolved Hide resolved
internal/clients/aws.go Outdated Show resolved Hide resolved
internal/clients/aws.go Outdated Show resolved Hide resolved
// specified managed resource and produces a config that can be used to
// authenticate to AWS and tracks the ProviderConfigUsage. Useful for obtaining
// AWS config for non-upjet based MR controllers.
func GetAWSConfigWithProviderUsage(ctx context.Context, c client.Client, mg resource.Managed) (*aws.Config, error) { // nolint:gocyclo
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it will help simplifying this package if we just move GetAWSConfigWithProviderUsage to the ClusterAuth.eks controller package as currently, it's the only controller who uses this. If, in the future, we add another controller that needs this, then we can move this to a command module. What do you think?

internal/clients/aws.go Outdated Show resolved Hide resolved
internal/clients/aws.go Show resolved Hide resolved
@erhancagirici erhancagirici force-pushed the refactor-aws-config-single-path branch 3 times, most recently from f148c02 to 50a5d83 Compare March 14, 2024 12:02
@erhancagirici erhancagirici force-pushed the refactor-aws-config-single-path branch from 50a5d83 to 481dde7 Compare March 14, 2024 12:11
@erhancagirici erhancagirici requested a review from ulucinar March 14, 2024 12:29
@ulucinar ulucinar changed the title unify TF AWS provider configuration logic Unify TF AWS provider configuration logic Mar 14, 2024
Copy link
Collaborator

@ulucinar ulucinar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @erhancagirici, lgtm.

@ulucinar
Copy link
Collaborator

The publish-artifacts job is expected to fail due to the recent Upbound SSO changes. We will need to fix the credentials related to that job.

@erhancagirici erhancagirici merged commit 7b8f9e4 into main Mar 14, 2024
10 of 11 checks passed
@ulucinar ulucinar deleted the refactor-aws-config-single-path branch March 14, 2024 13:40
@mbbush mbbush mentioned this pull request Mar 16, 2024
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants